Chi Square Feature Extraction Based SVMs Arabic Language Text Categorization System
Abstract
This paper implements a Support Vector Machines (SVMs) based text classification system for Arabic-language articles. The classifier uses the Chi square (CHI) method for feature selection in the pre-processing step of the text classification system. Compared to other classification methods, the system shows high classification effectiveness on an Arabic data set in terms of F-measure (F = 88.11).
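The CHI feature selection step can be illustrated with a short sketch. It assumes the standard term/category formulation, in which A, B, C and D count documents in a 2x2 contingency table and χ²(t, c) = N(AD − CB)² / ((A+C)(B+D)(A+B)(C+D)). The abstract does not state which CHI variant, tokenization or cut-off the system uses, so the helper names `chi_square` and `top_k_terms`, the per-category ranking and the toy data are illustrative assumptions.

```python
from collections import Counter


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """CHI statistic for one (term, category) pair from a 2x2 contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    d: documents outside the category that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. the term occurs in every document): no evidence either way.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def top_k_terms(docs, labels, category, k):
    """Hypothetical helper: rank terms by CHI against one category and keep the top k.

    docs: list of token lists; labels: category name for each document.
    Assumption: presence/absence counts per document, as in the usual CHI formulation.
    """
    n_docs = len(docs)
    n_in = sum(1 for y in labels if y == category)
    df_in, df_out = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = df_in if y == category else df_out
        target.update(set(tokens))  # count each term at most once per document
    scores = {}
    for term in set(df_in) | set(df_out):
        a, b = df_in[term], df_out[term]
        c, d = n_in - a, (n_docs - n_in) - b
        scores[term] = chi_square(a, b, c, d)
    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]
```

For example, with the hypothetical corpus `[["match", "goal", "team"], ["goal", "league"], ["market", "price"], ["price", "bank", "team"]]` labelled `["sports", "sports", "economy", "economy"]`, `top_k_terms(docs, labels, "sports", 2)` returns `[("goal", 4.0), ("price", 4.0)]`. "price" scores as high as "goal" even though it never occurs in a sports document, because CHI measures dependence in either direction.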
Similar resources
Naïve Bayesian Based on Chi Square to Categorize Arabic Data
Text classification is a supervised technique that uses labelled training data to learn a classifier and then uses the learned classifier to automatically classify the remaining texts. This paper investigates the Naïve Bayesian algorithm combined with the Chi Square feature selection method. Our comparisons are based on the macro F1, macro recall and macro precision evaluation measures. The experime...
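The macro-averaged measures named above compute precision, recall and F1 per category and then average them with equal weight, so small categories count as much as large ones. In this sketch macro F1 is the mean of per-category F1 scores; some papers instead take the harmonic mean of macro precision and macro recall, and the abstract does not say which definition it uses. The function name `macro_scores` is illustrative.

```python
def macro_scores(y_true, y_pred):
    """Macro-averaged (precision, recall, F1) over all categories seen in either list.

    Assumption: macro F1 is the unweighted mean of per-category F1 scores.
    """
    categories = sorted(set(y_true) | set(y_pred))
    if not categories:
        return 0.0, 0.0, 0.0
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for cat in categories:
        tp = sum(1 for gold, pred in pairs if gold == cat and pred == cat)
        fp = sum(1 for gold, pred in pairs if gold != cat and pred == cat)
        fn = sum(1 for gold, pred in pairs if gold == cat and pred != cat)
        # Treat 0/0 as 0 so categories with no predictions or no gold documents do not crash.
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    k = len(categories)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k
```

For instance, `macro_scores(["a", "a", "b", "b"], ["a", "b", "b", "b"])` gives macro precision 5/6, macro recall 0.75 and macro F1 11/15 (about 0.733).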
Support Vector Machines based Arabic Language Text Classification System: Feature Selection Comparative Study
Feature selection is essential for effective and accurate text classification systems. This paper investigates the effectiveness of six commonly used feature selection methods. Evaluation used an in-house collected Arabic text classification corpus, and classification is based on a Support Vector Machine classifier. The experimental results are presented in terms of precision, recall and Macroave...
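The abstract does not state which SVM formulation, kernel or solver is used. As a stand-in, the sketch below trains a binary linear SVM with Pegasos, a stochastic sub-gradient method for the L2-regularized hinge loss; it is not the paper's implementation. The names `train_linear_svm` and `predict`, the regularization constant `lam`, the iteration count and the appended bias feature are illustrative assumptions, and a multi-category Arabic corpus would need a multi-class scheme such as one-vs-rest on top of it.

```python
import random


def train_linear_svm(xs, ys, lam=0.01, iterations=1000, seed=0):
    """Pegasos-style training of a binary linear SVM (hypothetical helper, not the paper's solver).

    xs: equal-length feature vectors (e.g. weights of CHI-selected terms)
    ys: labels in {+1, -1}
    A constant 1.0 feature is appended so the last weight acts as a (regularized) bias.
    """
    data = [list(x) + [1.0] for x in xs]
    w = [0.0] * len(data[0])
    rng = random.Random(seed)  # fixed seed so training is reproducible
    for t in range(1, iterations + 1):
        i = rng.randrange(len(data))
        x, y = data[i], ys[i]
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        # The L2 term always shrinks w; the hinge term contributes only when the margin is violated.
        w = [(1.0 - eta * lam) * wj for wj in w]
        if margin < 1.0:
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 from the sign of the linear score (bias feature appended as in training)."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

In a one-vs-rest setup, each category would get its own weight vector trained on "this category" versus "all others", and a document would be assigned to the category whose raw score is highest. Pegasos is used here only because it fits in a few dependency-free lines.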
A Comparative Study with Different Feature Selection For Arabic Text Categorization
Feature selection benefits a learner by eliminating non-informative or noisy features and by reducing the overall feature space to a manageable size. In machine learning, the term feature selection refers to the process of selecting the subset of features used to represent the text. In this paper, we propose a new approach to text representation based on incorporating background knowledge Arabi...
The Use of Topic Representative Words in Text Categorization
We present a novel way to identify the representative words that are able to capture the topic of documents for use in text categorization. Our intuition is that not all word n-grams equally represent the topic of a document, and thus using all of them can potentially dilute the feature space. Hence, our aim is to investigate methods for identifying good indexing words, and empirically evaluate...
Arabic Text Classification Algorithm using TFIDF and Chi Square Measurements
Text categorization is the process of classifying documents into a predefined set of categories based on their keyword content. Text classification is an extended type of text categorization in which the text is further categorized into sub-categories. Many algorithms have been proposed and implemented for English text categorization and classification. However, few studies ...
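TF-IDF weighting, named in the title of this entry, can be sketched briefly. The abstract is truncated and does not give its TF or IDF variant, so this sketch assumes raw term counts for TF and the natural-log IDF log(N / df), where N is the number of documents and df the number of documents containing the term; the function name `tfidf` is illustrative.

```python
import math
from collections import Counter


def tfidf(docs):
    """Hypothetical helper: weight terms by raw count times log(N / df).

    docs: list of token lists. Returns one {term: weight} dict per document.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: each term counted once per document
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term frequency within this document
        vectors.append({term: count * math.log(n / df[term]) for term, count in tf.items()})
    return vectors
```

Under this variant a term that occurs in every document gets weight 0: `tfidf([["a", "b", "a"], ["b", "c"]])` gives `a` the weight 2*log(2) in the first document, `c` the weight log(2) in the second, and `b` the weight 0.0 in both.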